Analysis of Twitter, user: @dog_rates (WeRateDogs)

We analyze data from the Twitter account WeRateDogs (@dog_rates), which publishes pictures of cute dogs together with a rating. We do not take the ratings too seriously; instead we focus on more objective metrics, retweet_count and favorite_count, to evaluate the quality of a picture.

In [1]:
import pandas as pd
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

import altair as alt

pd.options.display.max_colwidth = 2500
In [2]:
# read the merged master dataset
df_fullmerge = pd.read_csv("data/twitter_archive_master.csv")
df_fullmerge.T
Out[2]:
0 1 2 3 4 5 6 7 8 9 ... 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355
Unnamed: 0 0 1 2 3 4 5 6 7 8 9 ... 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355
tweet_id 892420643555336193 892177421306343426 891815181378084864 891689557279858688 891327558926688256 891087950875897856 890971913173991426 890729181411237888 890609185150312448 890240255349198849 ... 666058600524156928 666057090499244032 666055525042405380 666051853826850816 666050758794694657 666049248165822465 666044226329800704 666033412701032449 666029285002620928 666020888022790149
timestamp 2017-08-01 16:23:56 +0000 2017-08-01 00:17:27 +0000 2017-07-31 00:18:03 +0000 2017-07-30 15:58:51 +0000 2017-07-29 16:00:24 +0000 2017-07-29 00:08:17 +0000 2017-07-28 16:27:12 +0000 2017-07-28 00:22:40 +0000 2017-07-27 16:25:51 +0000 2017-07-26 15:59:51 +0000 ... 2015-11-16 01:01:59 +0000 2015-11-16 00:55:59 +0000 2015-11-16 00:49:46 +0000 2015-11-16 00:35:11 +0000 2015-11-16 00:30:50 +0000 2015-11-16 00:24:50 +0000 2015-11-16 00:04:52 +0000 2015-11-15 23:21:54 +0000 2015-11-15 23:05:30 +0000 2015-11-15 22:32:08 +0000
source iphone iphone iphone iphone iphone iphone iphone iphone iphone iphone ... iphone iphone iphone iphone iphone iphone iphone iphone iphone iphone
text This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 https://t.co/0Xxu71qeIV This is Archie. He is a rare Norwegian Pouncing Corgo. Lives in the tall grass. You never know when one may strike. 12/10 https://t.co/wUnZnhtVJB This is Darla. She commenced a snooze mid meal. 13/10 happens to the best of us https://t.co/tD36da7qLQ This is Franklin. He would like you to stop calling him "cute." He is a very fierce shark and should be respected as such. 12/10 #BarkWeek https://t.co/AtUZn91f7f Here we have a majestic great white breaching off South Africa's coast. Absolutely h*ckin breathtaking. 13/10 (IG: tucker_marlo) #BarkWeek https://t.co/kQ04fDDRmh Meet Jax. He enjoys ice cream so much he gets nervous around it. 13/10 help Jax enjoy more things by clicking below\n\nhttps://t.co/Zr4hWfAs1H https://t.co/tVJBRMnhxl When you watch your owner call another dog a good boy but then they turn back to you and say you're a great boy. 13/10 https://t.co/v0nONBcwxq This is Zoey. She doesn't want to be one of the scary sharks. Just wants to be a snuggly pettable boatpet. 13/10 #BarkWeek https://t.co/9TwLuAGH0b This is Cassie. She is a college pup. Studying international doggo communication and stick theory. 14/10 so elegant much sophisticate https://t.co/t1bfwz5S2A ... Here is the Rand Paul of retrievers folks! He's probably good at poker. Can drink beer (lol rad). 8/10 good dog https://t.co/pYAJkAe76p My oh my. This is a rare blond Canadian terrier on wheels. Only $8.98. Rather docile. 9/10 very rare https://t.co/yWBqbrzy8O Here is a Siberian heavily armored polar bear mix. Strong owner. 10/10 I would do unspeakable things to pet this dog https://t.co/rdivxLiqEt This is an odd dog. Hard on the outside but loving on the inside. Petting still fun. Doesn't play catch well. 
2/10 https://t.co/v5A4vzSDdc This is a truly beautiful English Wilson Staff retriever. Has a nice phone. Privileged. 10/10 would trade lives with https://t.co/fvIbQfHjIe Here we have a 1949 1st generation vulpix. Enjoys sweat tea and Fox News. Cannot be phased. 5/10 https://t.co/4B7cOc1EDq This is a purebred Piers Morgan. Loves to Netflix and chill. Always looks like he forgot to unplug the iron. 6/10 https://t.co/DWnyCjf2mx Here is a very happy pup. Big fan of well-maintained decks. Just look at that tongue. 9/10 would cuddle af https://t.co/y671yMhoiR This is a western brown Mitsubishi terrier. Upset about leaf. Actually 2 dogs here. 7/10 would walk the shit out of https://t.co/r7mOb2m0UI Here we have a Japanese Irish Setter. Lost eye in Vietnam (?). Big fan of relaxing on stair. 8/10 would pet https://t.co/BLDqew2Ijj
expanded_urls https://twitter.com/dog_rates/status/892420643555336193/photo/1 https://twitter.com/dog_rates/status/892177421306343426/photo/1 https://twitter.com/dog_rates/status/891815181378084864/photo/1 https://twitter.com/dog_rates/status/891689557279858688/photo/1 https://twitter.com/dog_rates/status/891327558926688256/photo/1 https://twitter.com/dog_rates/status/891087950875897856/photo/1 https://gofundme.com/ydvmve-surgery-for-jax https://twitter.com/dog_rates/status/890729181411237888/photo/1 https://twitter.com/dog_rates/status/890609185150312448/photo/1 https://twitter.com/dog_rates/status/890240255349198849/photo/1 ... https://twitter.com/dog_rates/status/666058600524156928/photo/1 https://twitter.com/dog_rates/status/666057090499244032/photo/1 https://twitter.com/dog_rates/status/666055525042405380/photo/1 https://twitter.com/dog_rates/status/666051853826850816/photo/1 https://twitter.com/dog_rates/status/666050758794694657/photo/1 https://twitter.com/dog_rates/status/666049248165822465/photo/1 https://twitter.com/dog_rates/status/666044226329800704/photo/1 https://twitter.com/dog_rates/status/666033412701032449/photo/1 https://twitter.com/dog_rates/status/666029285002620928/photo/1 https://twitter.com/dog_rates/status/666020888022790149/photo/1
name Phineas Tilly Archie Darla Franklin None Jax None Zoey Cassie ... the a a an a None a a a None
dog_stage Unknown Unknown Unknown Unknown Unknown Unknown Unknown Unknown Unknown doggo ... Unknown Unknown Unknown Unknown Unknown Unknown Unknown Unknown Unknown Unknown
rating 130 130 120 130 120 130 130 130 130 140 ... 80 90 100 20 100 50 60 90 70 80
names Phineas Tilly Archie Darla Franklin None Jax None Zoey Cassie ... Unknown Unknown Unknown Unknown Unknown None Unknown Unknown Unknown None
date 2017-08-01 2017-08-01 2017-07-31 2017-07-30 2017-07-29 2017-07-29 2017-07-28 2017-07-28 2017-07-27 2017-07-26 ... 2015-11-16 2015-11-16 2015-11-16 2015-11-16 2015-11-16 2015-11-16 2015-11-16 2015-11-15 2015-11-15 2015-11-15
time 16:23:56 00:17:27 00:18:03 15:58:51 16:00:24 00:08:17 16:27:12 00:22:40 16:25:51 15:59:51 ... 01:01:59 00:55:59 00:49:46 00:35:11 00:30:50 00:24:50 00:04:52 23:21:54 23:05:30 22:32:08
hour 16 0 0 15 16 0 16 0 16 15 ... 1 0 0 0 0 0 0 23 23 22
day 1 1 31 30 29 29 28 28 27 26 ... 16 16 16 16 16 16 16 15 15 15
month 8 8 7 7 7 7 7 7 7 7 ... 11 11 11 11 11 11 11 11 11 11
year 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ... 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015
calmonth 8-2017 8-2017 7-2017 7-2017 7-2017 7-2017 7-2017 7-2017 7-2017 7-2017 ... 11-2015 11-2015 11-2015 11-2015 11-2015 11-2015 11-2015 11-2015 11-2015 11-2015
day_of_week Tuesday Tuesday Monday Sunday Saturday Saturday Friday Friday Thursday Wednesday ... Monday Monday Monday Monday Monday Monday Monday Sunday Sunday Sunday
retweet_count NaN 5532 NaN 7620 NaN 2752 NaN 16651 NaN 6456 ... 51 NaN 213 NaN 51 NaN 123 NaN 41 NaN
favorite_count NaN 30536 NaN 38567 NaN 18582 NaN 59454 NaN 29178 ... 103 NaN 402 NaN 119 NaN 263 NaN 118 NaN
jpg_url https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg ... https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg
dog_breed Unknown Chihuahua Chihuahua Labrador_retriever basset Chesapeake_Bay_retriever Appenzeller Pomeranian Irish_terrier Pembroke ... miniature_poodle golden_retriever chow Unknown Bernese_mountain_dog miniature_pinscher Rhodesian_ridgeback German_shepherd redbone Welsh_springer_spaniel

22 rows × 2356 columns

1. When are most of the tweets published?

In [3]:
# numeric helper column, so that sum(ones) counts the tweets in each cell
df_fullmerge["ones"] = 1

heat_count = alt.Chart(df_fullmerge).mark_rect().encode(
    alt.X('hours(timestamp):O', title='hour of day'),
    alt.Y('yearmonth(timestamp):O', title='date'),
    alt.Color('sum(ones):Q', title='Count of tweets'),
    alt.Tooltip(['yearmonth(timestamp)','hours(timestamp)','count(ones)'])
)


heat_count
Out[3]:

All the tweets that we are analyzing were published between Nov 2015 and Aug 2017.

In particular, most of the tweets were published between Nov 2015 and Jan 2016, during 01:00-05:00 and 17:00-19:00.

The month and hour with the most tweets was Dec 2015 at 04:00, with 59 tweets.

Interestingly, the accumulated number of retweets is not linked to the number of tweets: the most viral tweets were published in Jan 2017 at 03:00 (81008 accumulated retweets) and Jan 2016 at 20:00 (86289 accumulated retweets).
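
The comparison between tweet volume and accumulated retweets can also be checked numerically. The following is a minimal sketch, not the notebook's own method: the helper name month_hour_summary and the tiny demo frame are hypothetical, and only the column names timestamp and retweet_count come from the dataset above.

```python
import pandas as pd


def month_hour_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Tweet count and summed retweets per (calendar month, hour of day)."""
    ts = pd.to_datetime(df["timestamp"], utc=True)
    keys = [ts.dt.strftime("%Y-%m").rename("month"), ts.dt.hour.rename("hour")]
    return df.groupby(keys).agg(
        tweets=("retweet_count", "size"),   # size counts rows, including missing values
        retweets=("retweet_count", "sum"),  # sum skips missing retweet counts
    )


# Made-up rows for illustration only (not taken from the real archive).
demo = pd.DataFrame({
    "timestamp": [
        "2015-12-01 04:10:00 +0000",
        "2015-12-09 04:30:00 +0000",
        "2015-12-20 04:55:00 +0000",
        "2017-01-07 03:15:00 +0000",
    ],
    "retweet_count": [10, 20, None, 5000],
})

summary = month_hour_summary(demo)
busiest = summary["tweets"].idxmax()          # cell with the most tweets
most_retweeted = summary["retweets"].idxmax() # cell with the most retweets
print(busiest, most_retweeted)
```

On the real data, the same call on df_fullmerge should let the figures quoted above be compared directly with the heat map tooltips.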

In [4]:
heat_retweet = alt.Chart(df_fullmerge).mark_rect().encode(
    alt.X('hours(timestamp):O', title='hour of day'),
    alt.Y('yearmonth(timestamp):O', title='date'),
    alt.Color('sum(retweet_count):Q', title='Sum of retweets'),
    alt.Tooltip(['yearmonth(timestamp)','hours(timestamp)','sum(retweet_count)'])
)

heat_retweet
Out[4]:

3. What is the distribution of retweets for each published tweet?

In [8]:
# tweets without a retweet count (NaN) are excluded from the histogram
x = df_fullmerge["retweet_count"].dropna()

num_bins = 50

fig, ax = plt.subplots()
fig.set_size_inches(12,5)

# histogram of raw counts (density=False keeps the y axis as number of tweets)
n, bins, patches = ax.hist(x, num_bins, density=False)


ax.set_xlabel('Retweet Count')
ax.set_ylabel('Count Tweets')
ax.set_title('Histogram of Retweet Count')


plt.show()

Most tweets have a low number of retweets; very few tweets go viral, but when they do, the impact on the retweet count is huge.
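
The skew described above can be put into numbers by comparing the median with the mean and by measuring how much of the total comes from the top few tweets. This is a rough sketch under stated assumptions: retweet_skew and its top_share parameter are hypothetical names, and the demo values are invented.

```python
import pandas as pd


def retweet_skew(retweets: pd.Series, top_share: float = 0.01) -> dict:
    """Summarise how concentrated retweets are in the most retweeted tweets."""
    counts = retweets.dropna().sort_values(ascending=False)
    n_top = max(1, int(len(counts) * top_share))  # always keep at least one tweet
    return {
        "median": counts.median(),
        "mean": counts.mean(),
        "top_share_of_total": counts.iloc[:n_top].sum() / counts.sum(),
    }


# Invented values: many small counts, one viral tweet, one missing count.
demo = pd.Series([50, 80, 120, 200, 300, 450, 700, 1200, 80000, None])
stats = retweet_skew(demo, top_share=0.1)
print(stats)
```

A mean far above the median, together with a large top_share_of_total, is the numeric counterpart of the long right tail in the histogram. Applied to the notebook's data, the call would be retweet_skew(df_fullmerge["retweet_count"]).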

4. Which dog breeds have the most accumulated retweets?

The three most frequently reported dog breeds in our dataset are:

  • Golden Retriever (173 tweets)
  • Labrador Retriever (113 tweets)
  • Chihuahua (95 tweets)
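
A ranking like the one above can be reproduced with a simple frequency count. The sketch below assumes that rows labelled "Unknown" in dog_breed are excluded from the ranking, which the notebook does not state; top_breeds and the demo frame are hypothetical.

```python
import pandas as pd


def top_breeds(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Number of tweets per breed, most frequent first, ignoring 'Unknown'."""
    known = df.loc[df["dog_breed"] != "Unknown", "dog_breed"]
    return known.value_counts().head(n)


# Invented breed labels for illustration only.
demo = pd.DataFrame({"dog_breed": [
    "golden_retriever", "golden_retriever", "golden_retriever",
    "Labrador_retriever", "Labrador_retriever",
    "Chihuahua",
    "Unknown", "Unknown", "Unknown", "Unknown",
]})

top = top_breeds(demo)
print(top)
```

Calling top_breeds(df_fullmerge) would give the counts for the real dataset.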

In [6]:
# altair is already imported as alt in the first cell; vega_datasets is not needed here

source = df_fullmerge

alt.Chart(source).mark_bar().encode(
    y='sum(retweet_count):Q',
    tooltip = ['dog_breed','sum(retweet_count)', 'count(ones)'],
    x=alt.X('dog_breed:N', sort='-y')
)
Out[6]:

Which dog breeds are the most retweeted over time?

Virality does not seem to be related to dog breed: outliers that went viral can be found in any breed. For example, Labrador retriever in Jun 2016 (75153 retweets) outperformed by far every other month for the same breed.

Furthermore, some breeds that are rarely published, such as Eskimo dog (Jun 2016, 55956 retweets) or standard poodle (Jan 2017, 36372 retweets), perform really well when they do go viral.
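
One way to spot such outlier months without reading them off the chart is to total retweets per breed and month and compare each total with that breed's typical month. This is only a sketch: monthly_breed_retweets is a hypothetical helper, the median is an assumed choice of baseline, and the demo numbers are invented.

```python
import pandas as pd


def monthly_breed_retweets(df: pd.DataFrame) -> pd.DataFrame:
    """Retweets summed per (breed, month), with each breed's median month as baseline."""
    ts = pd.to_datetime(df["timestamp"], utc=True)
    month = ts.dt.strftime("%Y-%m").rename("month")
    totals = (
        df.groupby(["dog_breed", month])["retweet_count"]
        .sum()
        .rename("retweets")
        .reset_index()
    )
    # Baseline: the median monthly total of the same breed.
    totals["breed_median"] = totals.groupby("dog_breed")["retweets"].transform("median")
    return totals.sort_values("retweets", ascending=False)


# Invented rows for illustration only.
demo = pd.DataFrame({
    "timestamp": [
        "2016-05-10 01:00:00 +0000",
        "2016-06-04 02:00:00 +0000",
        "2016-06-20 03:00:00 +0000",
        "2016-07-01 01:00:00 +0000",
        "2016-06-11 05:00:00 +0000",
    ],
    "dog_breed": [
        "Labrador_retriever", "Labrador_retriever", "Labrador_retriever",
        "Labrador_retriever", "Eskimo_dog",
    ],
    "retweet_count": [1000, 9000, 500, 1200, 7000],
})

ranked = monthly_breed_retweets(demo)
top_row = ranked.iloc[0]
print(ranked)
```

Rows whose retweets value is many times the breed_median are the viral months. For a breed with a single month of data the two values coincide, so rarely published breeds need to be judged against the overall ranking instead.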

In [7]:
# altair is already imported as alt in the first cell; vega_datasets is not needed here

source = df_fullmerge

alt.Chart(source).mark_circle(
    opacity=0.8,
    stroke='black',
    strokeWidth=1
).encode(
    alt.X('yearmonth(timestamp):O', axis=alt.Axis(labelAngle=0)),
    alt.Y('dog_breed:N'),
    alt.Size('retweet_count:Q',
        scale=alt.Scale(range=[0, 4000]),
        legend=alt.Legend(title='Retweet count')
    ),
    alt.Tooltip(['dog_breed','yearmonth(timestamp)','names','tweet_id:N','retweet_count']),
    alt.Color('dog_breed:N', legend=None)
).properties(
    width=500,
    height=1000
)
Out[7]: